EUSC: A clustering-based surrogate model to accelerate evolutionary undersampling in imbalanced classification
نویسندگان
چکیده
Learning from imbalanced datasets is highly demanded in real-world applications and a challenge for standard classifiers that tend to be biased towards the classes with majority of examples. Undersampling approaches reduce size class balance distributions. Evolutionary-based are prominent, treating undersampling as binary optimisation problem determines which examples removed. However, their utilisation limited small due fitness evaluation costs. This work proposes two-stage clustering-based surrogate model enables evolutionary compute values faster. The main novelty lies development based on meaning (phenotype) rather than representation (genotype). We conduct an 44 datasets, showing comparison original undersampling, we can save up 83% runtime without significantly deteriorating classification performance.
منابع مشابه
Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy
Learning with imbalanced data is one of the recent challenges in machine learning. Various solutions have been proposed in order to find a treatment for this problem, such as modifying methods or the application of a preprocessing stage. Within the preprocessing focused on balancing data, two tendencies exist: reduce the set of examples (undersampling) or replicate minority class examples (over...
متن کاملProposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
متن کاملOn Model-Based Clustering, Classification, and Discriminant Analysis
The use of mixture models for clustering and classification has burgeoned into an important subfield of multivariate analysis. These approaches have been around for a half-century or so, with significant activity in the area over the past decade. The primary focus of this paper is to review work in model-based clustering, classification, and discriminant analysis, with particular attenti...
متن کاملAddressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
In the classification framework there are problems in which the number of examples per class is not equitably distributed, formerly known as imbalanced data sets. This situation is a handicap when trying to identify the minority classes, as the learning algorithms are not usually adapted to such characteristics. An usual approach to deal with the problem of imbalanced data sets is the use of a ...
متن کاملEvolutionary-based selection of generalized instances for imbalanced classification
In supervised classification, we often encounter many real world problems in which the data do not have an equitable distribution among the different classes of the problem. In such cases, we are dealing with the so-called imbalanced data sets. One of the most used techniques to deal with this problem consists of preprocessing the data previously to the learning process. This paper proposes a m...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Applied Soft Computing
سال: 2021
ISSN: ['1568-4946', '1872-9681']
DOI: https://doi.org/10.1016/j.asoc.2020.107033